Scalable Nonparametric Multiway Data Analysis
Authors
Abstract
Multiway data analysis deals with multiway arrays, i.e., tensors, and has two goals: predicting missing entries by modeling the interactions between array elements, and discovering hidden patterns, such as clusters or communities, in each mode. Despite their success, existing tensor factorization approaches are either unable to capture nonlinear interactions or too computationally expensive to handle massive data. In addition, most existing methods lack a principled way to discover latent clusters, which is important for a better understanding of the data. To address these issues, we propose a scalable nonparametric tensor decomposition model. It employs a Dirichlet process mixture (DPM) prior to model the latent clusters, and it uses local Gaussian processes (GPs) to capture nonlinear relationships and to improve scalability. An efficient online variational Bayes Expectation-Maximization algorithm is proposed to learn the model. Experiments on both synthetic and real-world data show that the proposed model is able to discover latent clusters with higher prediction accuracy than competitive methods. Furthermore, the proposed model obtains significantly better predictive performance than the state-of-the-art large-scale tensor decomposition algorithm, GigaTensor, on two large datasets with billions of entries.
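To make the Dirichlet process mixture prior mentioned in the abstract more concrete, the following is a minimal sketch of a truncated stick-breaking draw of cluster weights, followed by sampling cluster labels for the entities of one tensor mode. It is not the paper's algorithm: the function names, the concentration value `alpha=2.0`, the truncation level, and the seeds are all assumptions chosen for illustration, and the local Gaussian process component and the variational inference are not shown.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    alpha: concentration parameter (illustrative value, not from the paper).
    truncation: number of sticks kept; the last stick absorbs the remainder.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # close the truncation so the weights sum to 1
    return weights


def assign_clusters(n_items, weights, seed=1):
    """Sample a cluster label for each item according to the mixture weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=n_items)


# Hypothetical usage: 20 entities in one tensor mode, at most 10 clusters.
weights = stick_breaking_weights(alpha=2.0, truncation=10)
labels = assign_clusters(n_items=20, weights=weights)
```

A smaller `alpha` tends to concentrate mass on the first few sticks and so yields fewer occupied clusters, which is why the number of clusters does not have to be fixed in advance under this kind of prior.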
Similar resources
Infinite Tucker Decomposition: Nonparametric Bayesian Models for Multiway Data Analysis
Tensor decomposition is a powerful computational tool for multiway data analysis. Many popular tensor decomposition approaches, such as the Tucker decomposition and CANDECOMP/PARAFAC (CP), amount to multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g., missing data and binary data), and (iii) noisy observations and ...
Scalable Bayesian Low-Rank Decomposition of Incomplete Multiway Tensors
We present a scalable Bayesian framework for low-rank decomposition of multiway tensor data with missing observations. The key issue of pre-specifying the rank of the decomposition is sidestepped in a principled manner using a multiplicative gamma process prior. Both continuous and binary data can be analyzed under the framework, in a coherent way using fully conjugate Bayesian analysis. In par...
InfTucker: t-Process based Infinite Tensor Decomposition
Tensor decomposition is a powerful tool for multiway data analysis. Many popular tensor decomposition approaches—such as the Tucker decomposition and CANDECOMP/PARAFAC (CP)—conduct multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g. missing data and binary data), and (iii) noisy observations and outliers. To ad...
Multiway Regularized Generalized Canonical Correlation Analysis
Regularized Generalized Canonical Correlation Analysis (RGCCA) is currently geared toward the analysis of two-way data matrices. In this paper, multiway RGCCA (MGCCA) extends RGCCA to the multiway data configuration. More specifically, MGCCA aims at studying the complex relationships between a set of three-way data tables.
CLAM: Connection-less, Lightweight, and Multiway Communication Support for Distributed Computing
A number of factors motivate and favor the implementation of communication protocols in user-space. There is a particularly strong motivation for the provision of scalable, multiway and connectionless transport for distributed computing, multimedia, and conferencing applications. This is also true of high speed networking, where it is beneficial to keep the OS kernel out of the critical path in ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2015